Coherent follow-up of Continuous 

Gravitational- Wave candidates: minimal required 

observation time 

Miroslav Shaltev 

E-mail: miroslav . shaltev@aei .mpg . de 

Albert-Einstein-Institut, Callinstr. 38, 30167 Hannover, Germany 
LIGO-P1100172-v3 Fri Mar 2 16:52:05 2012 +0100 

Abstract. We derive two different methods to compute the minimal required integration time 
of a fully coherent follow-up of candidates produced in wide parameter space semi-coherent 
searches, such as global correlation StackSlide searches using Einstein@Home. We numerically 
compare these methods in terms of integration duration and computing cost. In a Monte Carlo 
study we confirm that we can achieve the required detection probability. 



1. Introduction 

Isolated neutron stars as potential sources of continuous gravitational waves are optimally 
studied with fully coherent matched filtering methods. These methods are not directly applicable 
to previously unknown objects due to the large parameter space that needs to be covered in 
all-sky wide parameter space searches and the related enormous computing cost pp. Advanced 
semi-coherent techniques, e.g. StackSlide searches on the distributed computing environment 
Einstein@Home [2], produce candidates that require follow-up in greatly reduced parameter 
space regions. A follow-up scheme consists of two basic stages. In the first refinement stage, 
we find the maximum-likelihood estimator and associated optimal search volume Vo- In the 
second zoom stage, we zoom in on the optimal search volume by semi-coherent or fully-coherent 
integration. In this paper we focus on a fully-coherent zoom for which we derive and discuss 
two different methods to compute the minimal required coherent integration time in order to 
distinguish real signals from noise. 

2. Properties of ^-statistic searches 

The J-"-statistic was first derived in [3J for the single detector case and generalized to multi- 
detector searches in [3] . Continuous gravitational- wave signals are monochromatic and sinusoidal 
in the frame of the gravitational- wave source and undergo phase- and amplitude modulation due 
to the rotation and orbital motion of the detector. The ^-"-statistic is analytically amplitude- 
maximized, thus the parameter space to search for signals is spanned by the remaining "Doppler 
parameters" A, namely sky position (a - right ascension, 5 - declination) and intrinsic frequency 
and frequency derivatives (/, /, /...), further referred to as spindowns. Searching for previously 
unknown objects with matched filtering implies computing matched filters for different points in 
parameter space, also referred to as templates. As realized in 0E] in the context of searches for 



gravitational waves from inspiraling binaries, a geometrical approach is best suited for optimal 
template placement and template counting. This is made possible by the introduction of a 
metric tensor on the parameter space and mismatch m 

m = 5ii AA*AA J ' + 0(AA 3 ) , (1) 

where the mismatch m measures the fractional loss of (squared) signal to noise ratio (SNR) p 2 
due to the usage of a nearby template A c with offset AA = A c — X s from the true parameters of 
a putative signal X s 

»=^M, (2) 

Ps 

with the squared SNR p 2 and p 2 obtained at point A s and A c , respectively. Given the metric, the 
problem of efficient lattice and alternative random and stochastic template-bank construction is 
studied in [TUBUS]. 

2.1. Fully- coherent search 

A fully-coherent search is the classical and most sensitive J r -statistic-based search in the case of 
unlimited available computing power or a sufficiently cheap computing cost requirement. The 
squared SNR p 2 scales linearly with the observation time T, according to the following formula: 
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where /io is the intrinsic signal amplitude, R represents the geometrical "detector response" , S 
is the one-sided noise spectral density, which is assumed constant in a narrow frequency band 
around /, and is the number of detectors [ID] . In the presence of a signal, the ^-statistic 
follows a non-central x 2 distribution with four degrees of freedom and non-centrality parameter 
p 2 . Thus the expectation value is 

E[2T s ]=4 + p 2 , (4) 

with standard deviation 

a{2F s ) = V / 2(4 + 2 /0 2) . (5) 

2.2. Semi- coherent search 

At fixed and limited computing cost a more sensitive detection statistic can be constructed from 
the incoherent combination of results obtained by coherent integration of shorter data segments. 
In particular we consider a Stack-Slide search [Til 021 03], where the statistic is the sum of the 
^-statistic over the segments: 
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This new statistic S follows a non-central x distribution with 4N degrees of freedom, thus the 
expectation value is 

E[^] =4N + pi , (7) 
where the non-centrality parameter is the sum of the squared SNRs over different segments 
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A trivial but useful reformulation of Eq. Q is in terms of average 2T = jj 2Fk and 

p 2 = TrllkPk^ namel y 

E[2T] = 4 + p 2 . (9) 



2.3. Template counting 

The number of templates sufficient to cover the search volume Vq is given by [7] 



M n = 9m- n / 2 Vn , (10) 

where 9 is the normalized thickness characterizing the geometric structure of covering, m is the 
maximum allowed mismatch , n the number of dimensions and 

V n = j d n X y/det^ , (11) 

is the metric template-bank volume with g^j the parameter space metric. This is the general 
form of the template counting formula, which is valid for arbitrary lattices and curved parameter 
spaces. In practice, using the flat metric approximation, where the metric coefficients are 
constant, we can take the determinant out of the integral. Moreover, if the parameter space 
is a n-dimensional "box", we can replace the integral over infinitesimal displacement dX by a 
product of n "search bands" AA, namely 

n 

V n = V / de^n AA * • ( 12 ) 
%=\ 

Follow-up of candidates from semi-coherent searches involves a semi-coherent metric, shown in 
I14j to be the average of the metric computed for every segment. The semi-coherent metric 
allows us to estimate the search band AA; around the follow-up candidate using the diagonal 
elements of the inverse Fisher matrix |15| 116]. i.e. 

AA; = ny/?* , (13) 

with 

F* = tip 2 , (14) 

where k defines the confidence level and g %3 is the inverse matrix to gij. In the present work 
we use an analytical semi-coherent metric first derived by Pletsch p3]. For coherent integration 
time longer than a day, but much shorter than a year, the number of sky templates at fixed 
frequency / converges to 

Kky = ^- , 15 

y m 

where te ~ 21 x 10 -3 s is the light travel time from the Earth's center to the detector |14j . The 
semi-coherent parameter space is finer than the coherent one by a refinement factor 7. Using 
the notion of refinement per direction 7„ we can also obtain the search bands from the extents 
of the fully coherent metric, namely 




AA i = K , ^. (16) 



For uniformly distributed segments of data without gaps, based on [M] the refinement factors 
can be obtained as 

7/ = 1, (17) 
7/ = y^iV 2 - 4 , (18) 
jf = ^(35N 4 - 1407V 2 + 108)/3 , (19) 
7j = ^(105^ - 12607V 6 + 5012iV 4 - 6160iV 2 + 2304)/(5iV 2 - 4) . (20) 



Finally, for simplicity of the template-bank construction, we use a hyper-cubic lattice to place 
templates, though hyper-cubic lattices are in general suboptimal, compared to better solutions, 
e.g. A* n lattice. The normalized thickness for an n-dimensional hyper-cubic grid is [7] 

n = n 11 / 2 2~ n . (21) 

The proper choice of the number of dimensions that maximizes the number of templates 

m m m eh m is: 

TV = maxA/^, . (22) 

n 

2.4- Computing cost 

In the follow-up of real candidates, especially weak signal candidates, along with the constraint 
of the total amount of available data, the computing cost constraint may limit significantly the 
feasibility of the search. Thus the computing-cost requirement is of particular interest. There are 
currently two different strategies to implement an ^-statistic search code in LIGO's reference 
software suite lalsuiteJTTj, namely the SFT-method based on short Fourier transforms of 
the data with duration Tsft [E! and the FFT-method based on barycentric resampling 
|18| . Regarding the computational cost, the FFT method is preferable, as the computational 
requirement to calculate the J-"-statistic, for a single point in the parameter space, scales only 
with logT, while the cost of the SFT algorithm scales with T. However, for historical reasons 
the SFT method is currently still more often used by LIGO/LSC [H HH EE], is well tested and 
we can use recent timing information. The computing cost of a SFT-based ^-statistic search is 

C = Mc N SFT , (23) 

where JVsft is the number of used SFTs, namely 

iVsFT = N d T/T SFT (24) 

and Co is the fundamental implementation- and hardware-specific computing constant per SFT 
and template. 

3. Minimal required observation time 

The main scope of the present work is to find the minimal required observation time that 
guarantees a certain detection probability of a putative signal buried deep in the detector 
noise at a certain confidence level by using the fully-coherent ^-statistic search technique. We 
consider two different methods to compute the required integration duration. In method 1, 
which is closely related to hypothesis testing, we use the concept of false-alarm and false- 
dismissal probability to achieve certain detection probability. This is the natural way to compute 
the required integration time. In method 2 we alternatively use the more intuitive notion 
of expectation value to find the observation duration that guarantees the required detection 
probability. 

3.1. Method 1 

In absence of a signal, the probability density function of the ^-statistic reduces to a central 
X 2 -distribution, and the false-alarm probability is given by 

pj A = / d{2F)xl{2F;V) , (25) 



where p\ A denotes single trial false-alarm probability and xlC^J 7 ; 0) is the central x 2 -distribution 
with 4 degrees of freedom. The integration of \^(2F, 0) = \Fe~- F yields 

PfA = (1 + . (26) 

The overall false-alarm probability of crossing the threshold 2Fth in A/" trials is 

m = l-(l-p} A f ^p} A M, (27) 

when p\ A M <C 1 [3 [22], thus 

p\ A =p lA /M . (28) 



We cannot solve Eq. ( 26 ) analytically, but numerical solution gives a threshold 2J r t h value. This 



allows us to numerically integrate the false-dismissal probability 

PfD (2F th ,p 2 ) = f Ft \d2F)xl{2F lP 2 ) , (29) 

J — oo 

where pm{2Fth) = 1 — Pdet> with the desired detection probability pdet and x\(2F,p 2 ) is the 
non-central ^-distribution with 4 degrees of freedom and non-centrality parameter p 2 . At fixed 
Pf A and pl^, using the above equation, we can compute a threshold SNR Pth(PfA' Pm) ■ The 
required T is such that the inequality 

plc(T)>p 2 th (Pf A ,Ph) (30) 

holds, where p 2 c (T) is the accumulated SNR due to the presence of signal in the analyzed data. 
Assuming that the follow-up search will use data of similar constant noise floor, we can rewrite 
Eq. ^ as 

PI(T) = p 2 c] ^L , (31) 

where AT is the length of one segment in the semi-coherent search using data from number 
of detectors. With the average 2F C value of the candidate, we can compute its SNR p c from Eq. 
([9]), namely 

p 2 c = E[2F c ]-4. (32) 
Substitution in the equations above yields the accumulated SNR in presence of signal 

pi = (E[2F C ] - 4) ^ , (33) 

which gives the required minimal T. 
3.2. Method 2 

Computation of the ^-statistic on data with no signal, has a certain expectation value, therefore 
we ask what is the expected maximal 2F value E[2Fj\f] in N trials in Gaussian noise, where 
Fat = max {F}^ =1 . The probability to get (N—l) values of 2F less than 2T]^ follows a binomial 
distribution, namely 

PN^Tn) = ( M \\{2FM^-aif- 1 (34) 



x 1 , 

l -MF N e-^ (l - (1 + F^e-^f- 1 . (35) 



With this we can numerically integrate the expectation value 



E[2F N ] 



d{2F N )2F NPN {2F N ) , 



(36) 



and standard deviation 



a N {2F N ) = (J d{2F M ) {2F M - E[2F M }) 2 PM (2F M ) 



1/2 



(37) 



To safely distinguish a real signal from pure noise, we can require the following inequality to 
hold: 

E[2F S ] - ha s {2F s ) > E[2F N ] + ha N {2F N ) , (38) 

where the expectation value E[2Fs] of a real signal and its standard deviation as(2Fs) are 
computed using Eqs. Q and ([5]). As all terms in inequality (38) are function of the observation 
time, this gives an alternative method to compute the minimal required integration time. Fine- 
tuning of Eq. (38) is possible through the safety parameter h, which we quantify by using 
Chebyshev's inequality. For a random variable X, with expected value E[X] and standard 
deviation a, 

P(\X - E[X}\ > ha) < 1/h 2 , (39) 
which means that at least a fraction 

p = l-l/h 2 (40) 

of the data is within h standard deviations on either side of the mean (23]. Rearranging the 
above equation yields 

h=l/y/l^p. (41) 

Having two independent random variables, 2Fs and 2Fj\f, we can label the fraction of data 
around each mean as ps and pj\f and introduce the joint probability pj = psPM- We see, that 
the same joint probability can be achieved for different combinations of p$ and pj\f. However, a 
natural choice is ps = pn, thus 

h = -y/pj- (42) 

We give a set of pj values and related h in Table [TJ 



Pj 


0.75 


0.90 


0.95 


0.99 


h 


2.73 


4.41 


6.28 


14.12 



Table 1. Joint probability pj and corresponding required h standard deviations. 



Fixing pj to some value and with this h in inequality ( 38 ) , we can compute the minimal required 



coherent observation time T, such that (38) holds. For this integration time, the joint probability 
pj becomes the separation probability p sep = pj. This is the probability, that a candidate due 
to the presence of a signal is consistent with the signal hypothesis and a candidate due to 
the noise is consistent with the noise hypothesis. Taking into account that ps = 1 — PfD and 
pj\f = 1 — pfA , we find the relation of the separation probability to the detection probability, 
namely p sep = Pdet(l - Pih), or for negligible false-alarm p dct w p sep . 



4. Method comparison 

4 ■ 1 ■ Numerical predictions 

In the following we compare the two methods to find the minimal required integration time 
described in the previous section in terms of observation duration and computing cost. We 
consider a StackSlide search with N = 205 segments of duration AT = 25 hours, each using 
data from = 2 detectors. For a hypothetical candidate with fixed Doppler parameters 
a = 1.45 rad, 5 = rad / = 185 Hz, / = —1 x 10 -9 Hz/s, we pick an average strength 



in the range 2T C 6 [5,13]. Then using Eq. (13) with k = 1 and the semi-coherent metric 



we compute the search bands associated with such a candidate. Having that, for mismatch 



m = 0.01 and a hyper-cubic lattice, we can compute the number of templates using Eq. (10) 



and the fully- coherent metric. Using method 1, requiring detection probability p^ ct = 0.9 at 



overall false-alarm probability p^ A = 0.01 using Eq. (29) we compute Pth^fA'^ffl) an< ^ * ne 
minimal required observation time Tl, which substituted in Eq. (33) with = iVS satisfies 



Eq. (30). For method 2 a separation probability equal to p^ et yields safety factor h = 4.41, see 
Table]! We label the integration time that satisfies Eq. (38) as T2 and plot both integration 
times T\(2T C ) and T2(2T C ) in Figure [l] (a) as function of 2T C . With the number of templates 
for Ti and T2 we estimate the computing cost C\ and C2 using the fundamental computing cost 
constant cq = 7 x 10~ 8 s in Eq. ( |24| ) and assuming SFTs of duration Tsft = 1800 s in Eq. (25). 



Ci(2T c ) and C2(2.F C ) are plotted in Figure [T] (b). In Figure [T] (c) we plot how the expectation 
value from a real signal grows with increasing T compared to loudest candidate from Gaussian 
noise. In this plot the candidate strength is fixed to 2T C = 8.5. 

We see that method 2 yields much longer observation time, at same candidate strength compared 
to method 1. Due to the resulting much larger number of templates, the computing cost, 
especially for weak candidates, is much higher. The inferiority of method 2 compared to method 
1 in terms of required integration duration and computing power can be explained by the ad hoc 
construction of method 2 and the use of Chebyshev's inequality, which is only a lower bound. In 
this sense method 2 is a more conservative approach, though the important information about 
false-alarm and false-dismissal probability gets lost in this framework. The computing cost of 
method 1 looks very promising even for weak candidates, however we should keep in mind that 
this is lower limit and the cost of a search with real data would most likely be much higher. 
The reason for this is that gaps in the data are direct penalty for the growth of p^ c , while p^ h 
remains unaffected. Furthermore, for very weak signals, the required integration duration may 
violate the assumption of constant sky resolution, thus we would underestimate the number of 
templates, resulting in a higher false-dismissal. 

4-2. Monte Carlo results 

To confirm the numerical predictions of method 1 we perform the following Monte Carlo studies. 
We create a set of 205 segments with duration 25 hours of Gaussian noise and draw a set of 
pulsar parameters a G (0, 2ir), 5 6 (— n/2, ir/2), cos 1 6 (—1, 1), tfr £ (0, 2ir), (fro £ (0, 2ir) at fixed 
frequency of / = 185 Hz and spindown value in the range / 6 (— //r, 0), where r = 2220 yr 
is the minimal spindown age of the source [I]. We inject a signal with the above parameters 
and intrinsic signal amplitude ho high enough to produce a candidate with expected average 
strength E[2Fs] € [12, 13]. To find the actual injected value we first do a targeted StackSlide 



search at the point of the injection. With this measured injected 2Fs value, using Eq. (13) we 
compute Fisher extents, from which we draw a random parameter point A c satisfying 

Tij AX i AX j < 1 . (43) 



The point A c is within the 1-a Fisher ellipsoid of the true signal location and becomes the 
candidate to follow up. Following the scheme for method 1 as described above, we compute 
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Figure 1. Numerical comparison between method 1 and method 2 (quantities labeled with 
1 and 2, respectively). Figure (a) shows the required coherent integration time as function of 
the strength of the candidate, (b) shows the computing cost depending on the strength of the 
candidate, (c) shows expected value of signal, noise and related h = 4.41 standard deviations 
for detection probability pdet = 0.9 of a candidate with 2T C = 8.5 . 

the minimal required coherent observation time targeting detection probability p^ et = 0.9 and 
search for the signal. After computation of 2Ts using the data with the injected signal, we 
compute 2J-tf with the same grid and integration duration using the noise only data. We claim 
"detection" whenever the loudest measured 27^$ value in the data with injected signal is higher 
than the loudest measured 2.7-jv of the noise. The result of the Monte Carlo simulations is 
as follows: in 897, out of 1000 trials, the measured 2J r s value in the data containing injected 
signal exceeds the measured 2Jjv value of the noise only data. With this the achieved detection 
probability j»det = 0.897 ± 0.023 is in accordance with the targeted detection probability p det - 

5. Discussion 

We derived two different methods to compute the minimal required coherent integration time 
in a fully-coherent ^-statistic search in the zoom stage of follow-up of candidates from a semi- 
coherent StackSlide search. By numerical comparison we showed that method 1 is superior 
to method 2 in terms of required integration duration and computing cost. We confirmed in 
a Monte Carlo study that the predicted coherent integration time is sufficient to achieve the 
desired detection probability. The results of this paper have been derived for Gaussian data 
without gaps and two detectors of equal noise floor. Further extension of this work is closely 



related to the data selection problem. 
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